Optimization of Latent Semantic Analysis based Language Model Interpolation for Meeting Recognition

Authors

  • Michael Pucher
  • Yan Huang
  • Özgür Çetin
Abstract

Latent Semantic Analysis (LSA) defines a semantic similarity space using a training corpus. This semantic similarity can be used to deal with long-distance dependencies, which are an inherent problem for traditional word-based n-gram models. This paper presents an analysis of interpolated LSA models applied to meeting recognition. For this task it is necessary to combine meeting and background models. We show the optimization of the LSA model parameters necessary for interpolating multiple LSA models. A comparison of LSA and cache-based models furthermore shows that the former contain more semantic information than is captured by the mere repetition of word forms.
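
To illustrate the semantic similarity space that LSA defines, the sketch below builds a toy term-document count matrix, projects the words into a low-rank space via truncated SVD, and compares word vectors by cosine similarity. The vocabulary and counts are invented for illustration and are not from the paper's corpus.

```python
import numpy as np

# Toy term-document count matrix (rows: words, columns: documents).
# The vocabulary and counts are invented for illustration only.
vocab = ["meeting", "agenda", "minutes", "goal", "ball"]
counts = np.array([
    [3.0, 2.0, 0.0, 1.0],  # "meeting" occurs mostly in docs 0-1
    [2.0, 3.0, 0.0, 0.0],  # "agenda"
    [1.0, 2.0, 0.0, 1.0],  # "minutes"
    [0.0, 0.0, 2.0, 3.0],  # "goal" occurs mostly in docs 2-3
    [0.0, 0.0, 3.0, 2.0],  # "ball"
])

# Truncated SVD projects each word into a low-rank latent semantic space.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]  # scaled word coordinates in the LSA space

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that co-occur in similar documents end up close in the space.
print(cosine(word_vecs[0], word_vecs[1]))  # "meeting" vs "agenda": high
print(cosine(word_vecs[0], word_vecs[4]))  # "meeting" vs "ball": low
```

In a language model, such similarities can be turned into a probability over the vocabulary given the recent history, which is what makes LSA complementary to short-span n-gram statistics.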

Related Papers

Semantic Similarity in Automatic Speech Recognition for Meetings

This thesis investigates the application of language models based on semantic similarity to Automatic Speech Recognition for meetings. We consider data-driven Latent Semantic Analysis based and knowledge-driven WordNet-based models. Latent Semantic Analysis based models are trained for several background domains and it is shown that all background models reduce perplexity compared to the n-gram...


Unsupervised Latent Speaker Language Modeling

In commercial speech applications, millions of speech utterances from the field are collected from millions of users, creating a challenge to best leverage the user data to enhance speech recognition performance. Motivated by an intuition that similar users may produce similar utterances, we propose a latent speaker model for unsupervised language modeling. Inspired by latent semantic analysis ...


Latent Semantic Analysis based Language Models for Meetings

Language models that combine N-gram models with Latent Semantic Analysis (LSA) based models have been successfully applied to conversational speech recognition [3] and to the Wall Street Journal recognition task [1]. LSA defines a semantic similarity space using a training corpus. This semantic similarity can be used for dealing with long distance dependencies, which are an inherent problem ...
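
A minimal sketch of the combination scheme this snippet describes: a linear interpolation of an n-gram probability with an LSA-based probability for the same word. The interpolation weight and the component probabilities below are hypothetical placeholder values, not figures from the paper.

```python
# Linear interpolation of an n-gram probability with an LSA-based
# probability for the same word given its history h.
# lam and the component probabilities are hypothetical placeholders.
def interpolate(p_ngram, p_lsa, lam):
    """P(w | h) = lam * P_ngram(w | h) + (1 - lam) * P_lsa(w | h)."""
    return lam * p_ngram + (1.0 - lam) * p_lsa

p = interpolate(p_ngram=0.02, p_lsa=0.05, lam=0.7)
print(round(p, 3))  # 0.029
```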


Latent Semantic Modeling and Smoothing of Chinese Language

Language modeling plays a critical role for automatic speech recognition. Typically, the n-gram language models suffer from the lack of a good representation of historical words and an inability to estimate unseen parameters due to insufficient training data. In this study, we explore the application of latent semantic information (LSI) to language modeling and parameter smoothing. Our approach...
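
The unseen-parameter problem mentioned in this snippet can be illustrated with a small sketch: an unsmoothed bigram assigns zero probability to any unseen word pair, while a simple interpolated (Jelinek-Mercer style) smoothing reserves mass for it by backing off to a lower-order distribution. All counts and probabilities below are toy values, not from any real corpus.

```python
# Toy counts for the context word "the"; values are illustrative only.
bigram_counts = {("the", "meeting"): 4, ("the", "agenda"): 1}
context_total = 5                      # total count of the context "the"
unigram_probs = {"meeting": 0.05, "agenda": 0.02, "minutes": 0.01}

def p_bigram(w, ctx="the"):
    """Unsmoothed maximum-likelihood estimate: zero for unseen pairs."""
    return bigram_counts.get((ctx, w), 0) / context_total

def p_smoothed(w, lam=0.8, ctx="the"):
    """Jelinek-Mercer style interpolation with the unigram distribution."""
    return lam * p_bigram(w, ctx) + (1.0 - lam) * unigram_probs[w]

print(p_bigram("minutes"))    # 0.0 -- the unseen bigram gets no mass
print(p_smoothed("minutes"))  # small but nonzero, via the unigram backoff
```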


A Maximum Entropy Approach for Semantic Language Modeling

The conventional n-gram language model exploits only the immediate context of historical words without exploring long-distance semantic information. In this paper, we present a new information source extracted from latent semantic analysis (LSA) and adopt the maximum entropy (ME) principle to integrate it into an n-gram language model. With the ME approach, each information source serves as a s...
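
The maximum-entropy integration this snippet outlines leads to a log-linear model; the sketch below shows the generic form, where each information source contributes a weighted feature score and probabilities are normalized over the candidate words. The feature scores and weights are illustrative placeholders, not fitted ME parameters.

```python
import math

def loglinear(feature_scores, weights):
    """P(w | h) proportional to exp(sum_i weight_i * f_i(w, h)),
    normalized over the candidate words."""
    unnorm = [math.exp(sum(w * f for w, f in zip(weights, fs)))
              for fs in feature_scores]
    z = sum(unnorm)  # normalization constant
    return [u / z for u in unnorm]

# Two candidate next words, each with an (n-gram, LSA) feature score pair.
probs = loglinear(feature_scores=[(1.2, 0.8), (0.5, 0.3)],
                  weights=[1.0, 0.5])
print(probs[0] > probs[1])  # True: the higher combined score wins
```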


Journal:

Volume  Issue

Pages  -

Published: 2006